Lecture 14 : Bandits with Budget Constraints

نویسنده

  • Zhipeng Liu
چکیده

1 Problem definition In the regular Multi-armed Bandit problem, we pull an arm It at each round t and a reward rt is collected, which depend on the bandit It. Now suppose that each round after pulling the arm, a cost ct is also incurred. When the total cost till time t surpass a given level B, the algorithm stops. This setting of problem is called Bandits with constraints. The formal description is following. At time t, • pull an arm It = i, observe reward rt ∈ [0, 1] and cost ct ∈ [0,1]. • Given It = i, (rt, ct) ∼ Di, where Di is a joint distribution dependent on arm i, denote E[rt|It = i] = μi, E[ctj |It = i] = Cij . • Stop when any budget constraint is violated. • Goal is to maximize T ∑

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Linear Contextual Bandits with Knapsacks

We consider the linear contextual bandit problem with resource consumption, in addition to reward generation. In each round, the outcome of pulling an arm is a reward as well as a vector of resource consumptions. The expected values of these outcomes depend linearly on the context of that arm. The budget/capacity constraints require that the total consumption doesn’t exceed the budget for each ...

متن کامل

Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits

We study contextual bandits with budget and time constraints under discrete contexts, referred to as constrained contextual bandits. The budget and time constraints significantly increase the complexity of exploration-exploitation tradeoff because they introduce coupling among contexts. Such coupling effects make it difficult to obtain oracle solutions that assume known statistics of bandits. T...

متن کامل

Infinitely Many-Armed Bandits with Budget Constraints

We study the infinitely many-armed bandit problem with budget constraints, where the number of arms can be infinite and much larger than the number of possible experiments. The player aims at maximizing his/her total expected reward under a budget constraint B for the cost of pulling arms. We introduce a weak stochastic assumption on the ratio of expected-reward to expected-cost of a newly pull...

متن کامل

Resourceful Contextual Bandits

We study contextual bandits with ancillary constraints on resources, which are common in realworld applications such as choosing ads or dynamic pricing of items. We design the first algorithm for solving these problems that improves over a trivial reduction to the non-contextual case. We consider very general settings for both contextual bandits (arbitrary policy sets, Dudik et al. (2011)) and ...

متن کامل

Budgeted Bandit Problems with Continuous Random Costs

We study the budgeted bandit problem, where each arm is associated with both a reward and a cost. In a budgeted bandit problem, the objective is to design an arm pulling algorithm in order to maximize the total reward before the budget runs out. In this work, we study both multi-armed bandits and linear bandits, and focus on the setting with continuous random costs. We propose an upper confiden...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016